Scatterplots in pydeck: A case study using Beijing subway stops

Since 1978, China's GDP has nearly doubled every seven years [Wikipedia]. That exponential growth has driven rapid change within the country, as shown in part by the transformation of Beijing's urban infrastructure.

Below we'll plot the location of Beijing subway stops over time. Locations for subway stops come from Wikipedia and OpenStreetMap. This is not a rigorous study, so some subway stops may be missing.

Getting the data

First, we can use the Pandas library to download our data. You're likely already familiar with it: Pandas is a very popular Python library for filtering, aggregating, and joining data.


In [ ]:
import pandas as pd
from pydeck import (
    data_utils,
    Deck,
    Layer
)

# First, let's use Pandas to download our data
URL = 'https://raw.githubusercontent.com/ajduberstein/data_sets/master/beijing_subway_station.csv'
df = pd.read_csv(URL)
df.head()

Data cleaning

Next, we'll do some necessary data housekeeping. The CSV encodes the [R, G, B, A] color values as a str, and literal_eval lets us convert that string into a list.


In [ ]:
from ast import literal_eval

# The CSV stores each [R, G, B, A] color as a string,
# so we parse it back into a Python list here
df['color'] = df['color'].apply(literal_eval)
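
To make the conversion concrete, here is a minimal sketch of what literal_eval does to a single value. The color string is a hypothetical example written in the same style as the CSV column, not a row taken from the data set.

```python
from ast import literal_eval

# Hypothetical value: a color list written out as text, as in the CSV
raw_color = '[255, 0, 0, 255]'

# literal_eval parses the text into the Python object it spells out
parsed_color = literal_eval(raw_color)
print(type(raw_color).__name__, '->', type(parsed_color).__name__)

# Unlike eval, literal_eval only accepts literals and rejects other expressions
rejected = False
try:
    literal_eval('__import__("os").getcwd()')
except (ValueError, SyntaxError):
    rejected = True
print('expression rejected:', rejected)
```

Because it refuses anything other than literals, literal_eval is a safer choice than eval for parsing values that come from a downloaded file.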

Automatically generate a viewport

pydeck features some utilities for visualizing data, like an automatic zoom using data_utils.compute_view for 2D data sets.

We'll also render the viewport on its own, just to verify that the visualization looks sensible.


In [ ]:
# Use pydeck's data_utils module to fit a viewport to the central 90% of the data
viewport = data_utils.compute_view(points=df[['lng', 'lat']], view_proportion=0.9)
auto_zoom_map = Deck(layers=[], initial_view_state=viewport)
auto_zoom_map.show()

Sure enough, we're centered on Beijing.
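
To build some intuition for what "fitting a viewport to the central 90% of the data" can mean, here is a rough sketch in plain Python. It is not pydeck's implementation: the helper names central_points and bounding_box_center are made up for illustration, the coordinates are hypothetical, and the zoom-level calculation is left out entirely.

```python
import math


def central_points(points, proportion=0.9):
    """Keep the given proportion of points closest to the mean center."""
    center_x = sum(p[0] for p in points) / len(points)
    center_y = sum(p[1] for p in points) / len(points)
    ranked = sorted(
        points,
        key=lambda p: math.hypot(p[0] - center_x, p[1] - center_y))
    keep = max(1, math.ceil(len(ranked) * proportion))
    return ranked[:keep]


def bounding_box_center(points):
    """Return the (lng, lat) midpoint of the points' bounding box."""
    lngs = [p[0] for p in points]
    lats = [p[1] for p in points]
    return ((min(lngs) + max(lngs)) / 2, (min(lats) + max(lats)) / 2)


# Hypothetical data: nine points around Beijing and one far-away outlier
points = [(116.30 + 0.02 * i, 39.90) for i in range(9)] + [(121.47, 31.23)]

kept = central_points(points, proportion=0.9)
center = bounding_box_center(kept)
print(len(kept), center)
```

Dropping the most distant points before computing the bounding box keeps a single stray coordinate from dragging the view away from where most of the data sits.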

Plotting the data

We'll render the data and use some Jupyter notebook functionality to provide a header with a year.

It's worth spending some time on each line, if you haven't seen the Layer object yet:

scatterplot = Layer(
    'ScatterplotLayer',
    df,
    get_radius=500,
    get_fill_color='color',
    get_position='[lng, lat]')

We specify the layer type as the first argument, the data as the second, and the layer's properties as keyword arguments. ScatterplotLayer is one of the layers available in the deck.gl core library.

For a list of other layers, see the deck.gl documentation. Remember that deck.gl is a JavaScript library and not a Python one, so its documentation may differ from pydeck in terminology and functionality. For example, deck.gl layers commonly take functions as arguments, which pydeck doesn't support; pydeck takes strings instead, such as the column name 'color' or the expression '[lng, lat]'.


In [ ]:
from IPython.display import display
import ipywidgets

year = 2019

scatterplot = Layer(
    'ScatterplotLayer',
    df,
    id='scatterplot-layer',
    get_radius=500,
    get_fill_color='color',
    get_position='[lng, lat]')
r = Deck(layers=[scatterplot], initial_view_state=viewport)

# Create an HTML header to display the year
display_el = ipywidgets.HTML('<h1>{}</h1>'.format(year))
display(display_el)
# Show the current visualization
r.show()

Playing the data forward in time

Finally, we can loop through the data and see the dramatic development in Beijing since 1971, as demonstrated by subway stop opening dates.


In [ ]:
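import time

for year in range(1971, 2020):
    # Keep only the stations opened so far. This is a string comparison, so it
    # relies on opening_date being stored as text that sorts chronologically.
    scatterplot.data = df[df['opening_date'] <= str(year)]
    # Reset the header to display the year
    display_el.value = '<h1>{}</h1>'.format(year)
    r.update()
    time.sleep(0.2)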
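
One subtlety in the loop is that the filter compares opening_date with the year converted to a string, so the comparison is character by character rather than by date. The sketch below shows the effect in plain Python. It assumes the dates are ISO-style 'YYYY-MM-DD' strings, which has not been checked against the CSV; the rows and the helper names opened_by and opened_through are hypothetical.

```python
# Hypothetical rows; assumes opening_date is an ISO-style 'YYYY-MM-DD' string
stations = [
    {'name': 'A', 'opening_date': '1971-01-15'},
    {'name': 'B', 'opening_date': '1984-09-20'},
    {'name': 'C', 'opening_date': '2008-07-19'},
]


def opened_by(rows, year):
    """Same comparison as the loop: date string against the year as a string."""
    return [r['name'] for r in rows if r['opening_date'] <= str(year)]


def opened_through(rows, year):
    """Compare the numeric year so stations opened during `year` are included."""
    return [r['name'] for r in rows if int(r['opening_date'][:4]) <= year]


print(opened_by(stations, 1971))       # '1971-01-15' sorts after '1971'
print(opened_through(stations, 1971))
print(opened_by(stations, 1985))
```

Under that assumption, a station that opens partway through a year first appears on the following year's frame. For an animation this is a small lag, and comparing on the numeric year is one way to remove it if it matters.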